The complex genomic diversity of Yersinia pestis on the long‐term plague foci in Qinghai–Tibet plateau

Abstract Plague is a typical natural focus disease that circulates in different ecology of vectors and reservoir hosts. We conducted genomic population and phylogenetic analyses of the Yersinia pestis collected from the 12 natural plague foci in China with more than 20 kinds of hosts and vectors. Different ecological landscapes with specific hosts, vectors, and habitat which shape various niches for Y. pestis. The phylogeographic diversity of Y. pestis in different kinds plague foci in China showed host niches adaptation. Most natural plague foci strains are region‐and focus‐specific, with one predominant subpopulation; but the isolates from the Qinghai–Tibet plateau harbor a higher genetic diversity than other foci. The Y. pestis from Marmota himalayana plague foci are defined as the ancestors of different populations at the root of the evolutionary tree, suggesting several different evolutionary paths to other foci. It has the largest pan‐genome and widest SNP distances with most accessory genes enriched in mobilome functions (prophages, transposons). Geological barriers play an important role in the maintenance of local Y. pestis species and block the introduction of non‐native strains. This study provides new insights into the control of plague outbreaks and epidemics, deepened the understanding of the evolutionary history of MHPF (M. himalayana plague focus) in China. The population structure and identify clades among different natural foci of China renewed the space cognition of the plague.

Two of them documented on the Qinghai-Tibet plateau are Marmota himalayana plague focus (MHPF, detected in 1954) and Microtus foscus plague natural focus (MFPF, found in 1997) . The former with biovar antiqua Y. pestis pathogens, the latter was biovar microtus. Before the 1990s, most human cases of plague were due to MHPF; since the 1990s, plague epidemics have erupted in southern China (mainly in Yunnan) (Wang et al., 2017); and since 2004, MHPF has reemerged and has been continually active, with outbreaks emerging year after year in conjunction with regional outbreaks, occasionally affecting human beings (Cui et al., 2013;He et al., 2021). Analysis of the spatiotemporal evolution and phylogenetic diversity of Y. pestis circulating in China is important for more effective control of its outbreaks and epidemics. At present, the local microevolution and phylogeography of Y. pestis among different natural plague foci in China remain unclear.
Thus, we combined ecologic and genomic epidemiology to gain insights into the geographic epidemic and local microevolution of plague.

| Yersinia pestis isolate dataset
Eighty-two strains collected from two main hosts: M. himalayana and Microtus were selected from the MHPF to clarify the evolutionary relationship and phylogenetic status of Qinghai-Tibet plateau plague foci of China. In order to define the global population structure and identify clades among different natural foci of China, we obtained a total of 544 Y. pestis with completed genomes or draft genomes from the NCBI database. Altogether, 381 Y. pestis genomes isolated from 14 provinces in China from 1940China from to 2021in addition, 245 Y. pestis strains published by other countries.
DNA from 82 Y. pestis was extracted using the Wizard® Genomic DNA Purification Kit (Promega) according to the manufacturer's instructions (Miller et al., 1988). Library preparation was performed using the Nextera XT Library Prep Kit (Illumina). The libraries were sequenced on an Illumina platform with a minimum of 100-fold coverage.

| Availability of data
The whole genome used in this study and phylogenetic analysis information are available in Table S1.

| SNP calling and phylogenetic analysis
The quality of the raw paired-end reads were evaluated using FastQC (v0.11.8) with the parameters: fastqc -q read_R1.fastq.gz -o output-Dir (de Sena & Smith, 2019). We did quality filter with fastp (version 0.20.1) to provide clean data for downstream analysis: fastp -i in.R1.
Recombinant regions of the full SNP alignment were identified using the software package Gubbins v3.0.0 (Croucher et al., 2015) to exclude the SNPs in recombinant regions for the following analysis.
Trees were visualized using either FigTree v.1.4.4 (https://github. com/ramba ut/figtree) or iTol (Letunic & Bork, 2021). We compared the SNP characteristics of Y. pestis from multiple hosts among different plague foci in China and created a pairwise SNP distance matrix to understand the host-vector-pathogen interactions in association with changes in the genome profile. The packages of ggplot2 (Ito & Murphy, 2013) in R (v3.6.1) were used to draw frequency distribution histograms of SNP distances between samples and intergroup box plots.

| Bayesian population structure analyses
Bayesian analysis with the program STRUCTURE v2.3.4 was used to assign the number of potential clusters present in the SNPs of all the 626 Y. pestis with a Markov Chain Monte Carlo (MCMC) algorithm (Wagner & Zhang, 2011). STRUCTURE was run with a burn-in period of 10,000 and MCMC simulation of 30,000 iterations. The ancestry model was the "admixture model," and the frequency model was "allele frequency correlated". In addition, runs for each cluster (K) = 1-10 were replicated four times. To determine the number of clusters to show how the isolates could be grouped, delta K was calculated using STRUCTURE Harvester Ver. 0.6.93 software (Evanno et al., 2005).

| Pan-genome construction and function analysis of unique genes
To compare of the genetic traits among isolates from the 12 natural foci of China, all 381 genomes of Y. pestis from China were used for pan-genome analysis. The genome sequences were annotated using the Prokka v1.14.5 (Seemann, 2014). We obtained the clustering of all protein sequences, and then characterized the core, accessory, and unique genes with CD-HIT to gain insight into the features of the pan-genome (Fu et al., 2012). The core genome included shared genes among all of the genomes, while the accessory genome contained genes shared by at least two strains; the remaining genes in only one strain were unique genes. The reads mapping were did to align the reads to the unique genome with Bowtie (Langmead & Salzberg, 2012). According to the sequence coverage of the unique genes, we can exclude the possibility of the contaminant. To better understand the functional features of the unique genes, the COG database was used for functional classification. All unique genes were searched against the COG database using BLASTp with an E value of 1e-5 (Tatusov et al., 2000).

| High diversity of Yersinia pestis strains isolated from MHPF
Most of the Y. pestis isolated from MHPF were at the root of the evolutionary tree and scattered over many of the phylogenetic branches.

| Biogeographical distribution of strains from other foci
The genome sequences of other foci were much conserved with one main cluster. Isolates in R. opimus plague focus of the Junggar Basin with 100% of 2.MED1; isolates in Spermoophilus dauricus alaschanicus plague focus of the Loess Plateau of Gansu and Ningxia provinces with 100% of 2.MED3; MFPF with 100% of 0.PE4C; Marmota F I G U R E 3 The maximum-likelihood phylogenetic tree of 626 Yersinia pestis genomes. The phylogenetic tree was constructed based on mutational SNPs across the whole core genome. The tree is colored based on the isolation regions, branch lineages, and natural foci. The strains sequenced in this study were marked in red stars.

| Bayesian population structure analysis
The STRUCTURE analysis splited the Y. pestis into eight genetic groups (Figure 4a) as the highest ΔK value was observed at K = 8 suggesting that it has remained separated in these regions.

| Pan-genome construction and SNP distance among different natural foci
Analysis of this combined set of 626 isolates revealed that 3509 core genes were detected (Figure 5a). The pairwise SNP distance among strains isolated from different plague foci were quite different, the clades of the MHPF had larger SNP distances than other foci (Figure 5b). The SNP distance within the same plague foci was different (Figure 4b, Figure S2). There are two peaks in the pairwise SNP distance distribution, one below 100 and one below 400 SNPs ( Figure 5c). The number of the pan genes increased together with the number of Y. pestis genomic sequences (Figure 5d).
The number of core genes decreased with the increase of strains ( Figure S1). The distribution of unique genes in each natural focus varied widely, varying from 40 to 1206, suggesting a high degree of genetic diversity among the foci. The MHPF had the highest number of unique genes (1206 genes), and the MBPF had the lowest (40 genes) (Figure 5a), which also demonstrated that the former has a higher diversity than the latter. The MHPF genome was heavily enriched in genes related to mobilome, prophages, and transposons; but MBPF genome was heavily enriched in genes related to carbohydrate transport and metabolism. Also, we cannot ignored the differ-

| The influence of the Qinghai-Tibet plateau in the evolution of Yersinia pestis
Plague is a typical natural focus infection disease. It is currently endemic in restricted areas in the world (Dubyanskiy & Yeszhanov, 2016). The bacterial genome has changed over the course of a long evolutionary history, adapting through interactions with hosts and the local environment (Papadopoulos et al., 1999).
This dynamic processes have led to genetic variation in Y. pestis, and it is important to explore this variation and its outcomes to be able to control and/or prevent further plague outbreaks.
In our study, more than 20 kinds of hosts and vectors from different parts of the world were analyzed. Marmota is currently considered the most important reservoir causing human outbreaks and maintaining plague persistence. The Qinghai-Tibet plateau has unique geographic features, and is the highest-altitude plague host area in the world. It is the biggest and one of the most active areas of animal and human plague in China with a large variety of plague animals and strong virulence of plague bacteria (Ge et al., 2015;He et al., 2021;Wang et al., 2011). Moreover, in addition to the main host (marmot), the secondary hosts have appeared from time to time in recent years (Dai et al., 2018(Dai et al., , 2019Ge et al., 2015).

| Geological barriers play an important role in the maintenance of local Yersinia pestis species and block the introduction of non-native strains
The host and vector population dynamics and human movement may increase the chances of exposure and infection to drive the genome evolution (Cui et al., 2020;He et al., 2021).

ACK N O WLE D G E M ENTS
The authors thank Charlesworth author services for their critical editing and helpful comments regarding our article.

CO N FLI C T O F I NTER E S T S TATEM ENT
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Genetic data: Raw sequence reads are deposited in the SRA (BioProject 811346).